Introduction to Response Time in APIs

Get to know the different time factors affecting API performance.

Motivation

Most modern applications are data-oriented. These applications process data and present it to users in user-friendly formats. In dynamic applications especially, the data updates continuously. A server stores this continuously updating information and serves it whenever connected devices or clients request it. In this chapter, we focus primarily on the Internet because it is the most common medium through which clients request services via APIs.

At the API design level, we must establish API SLAs that are realistically achievable with current technology and within our cost budget. For example, for voice calls over the Internet, a one-way latency of more than 100 ms starts to degrade the listener’s experience. So, in this case, we (as API and back-end designers) have a concrete threshold to target. We then need to examine, end to end (from client to service), how to design the system to meet that goal (latency, in the case of voice over the Internet) and, if the goal cannot be met, how to mitigate the shortfall.

Over the years, major services like Google Search and others have set high expectations for customers in general. API designers can’t ignore such customer expectations, or their app might fail because no one wants to use a slow app. The following questions, if answered properly, result in an effective customer experience:

  • How quickly does the API act on requests and send responses back?

  • How does the increasing number of requests affect the performance of an API?

Depending on the required operations, different APIs may have varying latencies. These APIs access different types of memory to save or retrieve information, which also takes time. We’ll use the standard numbers in the table below in our calculations.

Standard Latency Numbers

  • 0.5 ns: CPU register access
  • 0.9 ns: L1 cache access
  • 2.8 ns: L2 cache access
  • 10 ns–100 ns: L3 cache access
  • 9 μs: reading 1 MB sequentially from memory
  • 100 μs–1,000 μs: SSD write latency; round trip within the same data center (around 500 μs)
  • 1 ms–10 ms: reading 1 MB sequentially from disk (around 2 ms); disk seek (around 4 ms); intra-zone network latency (around 5 ms)
  • 10 ms–100 ms: network round trip between two zones (inter-zone); reading 1 GB sequentially from memory on the same server
  • 100 ms–1,000 ms: password hashing; TLS handshake (250 ms–500 ms); network round trip between two regions; reading 1 GB sequentially from SSD on the same server
  • >1 s: transferring 1 GB between regions (across continents), which takes around 8 seconds on a 1 Gbps network as of 2023

Note: A region refers to a geographical location, a zone is an isolated location within a region, and a data center is a physical deployment of resources within a zone. A region can have multiple zones, and a zone can have multiple data centers. Moreover, intra-zone communication refers to communication between two data centers within a zone, and inter-zone communication refers to communication between two zones within a region.
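To build intuition for the scale differences in the table above, consider a small Python sketch. The dictionary below holds representative values copied from the table; the variable names and the chosen comparison are only illustrative:

```python
# Representative latency numbers from the table above, in nanoseconds.
LATENCY_NS = {
    "cpu_register_access": 0.5,
    "l1_cache_access": 0.9,
    "l2_cache_access": 2.8,
    "read_1mb_from_memory": 9_000,    # 9 us
    "round_trip_same_dc": 500_000,    # 500 us
    "read_1mb_from_disk": 2_000_000,  # 2 ms
    "disk_seek": 4_000_000,           # 4 ms
}

# How many L1 cache accesses fit into a single disk seek?
ratio = LATENCY_NS["disk_seek"] / LATENCY_NS["l1_cache_access"]
print(f"{ratio:,.0f} L1 cache accesses per disk seek")  # ~4,444,444
```

The point of such back-of-the-envelope comparisons is that a single disk or network operation can cost millions of cache accesses, which is why API designers push data toward faster tiers whenever possible.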

Latency vs. response time

Sending a request and getting a response back from the server takes some time. This time should be as low as possible to minimize the user-perceived latency for a better user experience. Measuring this time is critical in monitoring API performance, which leads to customer satisfaction. We measure it in the following two steps:

  • Latency (network latency) is the propagation time of a message (request and response) between the client and server, excluding the processing time.

  • Processing time is the time a server takes to process a request, including query execution, computation, file handling, and so on.

API response time is the time an API takes to respond to a request. It includes both network latency and processing time: it begins when the client sends the request and ends when the client receives the response. Although some references use latency and response time interchangeably, they measure two different time frames. The illustration below shows the key difference between the two.

Latency vs. response time

To summarize the discussion above, let’s take a look at the equation below:

Time_{response} = Time_{latency} + Time_{processing}

The equation above shows that a smaller response time requires us to lower the latency, the processing time, or both. Latency depends on various factors, such as the distance between the communicating machines and the intermediate network components along the path, for example, caches or proxy servers. A ping service is generally used to measure the latency of an API endpoint. However, a ping request takes less time (response time) than a request that retrieves data or files from a database.
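As a tiny illustration of the equation above (all numbers are made up), compare a ping-like request with a database-backed request over the same network path:

```python
def response_time(latency_ms: float, processing_ms: float) -> float:
    """Time_response = Time_latency + Time_processing."""
    return latency_ms + processing_ms

# A ping-like request: almost entirely network latency, negligible processing.
ping = response_time(latency_ms=40.0, processing_ms=0.5)

# A database-backed request on the same path: processing dominates.
db_read = response_time(latency_ms=40.0, processing_ms=250.0)

print(ping, db_read)  # 40.5 290.0
```

This is why a ping measurement alone understates the response time of real API calls: both requests share the same 40 ms of latency, but the totals differ by a factor of seven.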

Note: The appropriate response time depends on specific use cases. In general, an API is considered effective if it has an average response time between 0.1 and 1 second. For example, for a multiplayer online gaming service, a response time of 500 ms would not be optimal.

Factors affecting response time

In this section, we’ll look at the different factors affecting the response time of APIs. Some of the factors that can affect the processing time of a server are defined in the table at the beginning of the lesson. Let’s look at the other key factors in calculating the response time.

Let’s suppose the client/browser does not know the server’s IP address. The request first goes to a DNS server to resolve the domain name. After receiving the IP address, the client performs the TCP handshake with the server. Once the connection is acknowledged, the server sends its SSL/TLS certificate, and the two sides establish a secure channel. Next, the client sends the HTTP request to the destination server. The server processes the request, storing data in or retrieving it from the database as required. Finally, the server sends the response back to the client, indicating whether the request succeeded.
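The first three steps (DNS lookup, TCP handshake, and TLS handshake) can be timed individually with Python’s standard library. The sketch below is a rough measurement aid, not a precise profiler; the host name is only an example, and the observed times vary from run to run:

```python
import socket
import ssl
import time


def connection_phase_timings(host: str, port: int = 443) -> dict:
    """Time the DNS lookup, TCP handshake, and TLS handshake separately."""
    timings = {}

    # DNS lookup: resolve the domain name to an IP address.
    t0 = time.perf_counter()
    ip = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
    timings["dns_lookup_s"] = time.perf_counter() - t0

    # TCP handshake: open the connection to the resolved address.
    t0 = time.perf_counter()
    sock = socket.create_connection((ip, port), timeout=5)
    timings["tcp_handshake_s"] = time.perf_counter() - t0

    # TLS handshake: negotiate the secure channel over the open socket.
    t0 = time.perf_counter()
    ctx = ssl.create_default_context()
    tls_sock = ctx.wrap_socket(sock, server_hostname=host)
    timings["tls_handshake_s"] = time.perf_counter() - t0
    tls_sock.close()

    return timings


if __name__ == "__main__":
    print(connection_phase_timings("example.com"))
```

Tools such as browser developer consoles report the same phases; measuring them separately shows where a slow request actually spends its time before the server ever begins processing.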

The following illustration gives an overview of the events that occur during an API request and response.

The connection and communication structure between a client and a server

We break down the response time into the following segments, as depicted in the illustration:

  • DNS lookup is the time to resolve the IP address against a domain name through the DNS server.

  • TCP handshake is the time to establish an initial connection between the client and server.

  • SSL/TLS handshake is the time to create a secure communication channel for data exchange.

  • Transfer start is the time to acquire the first byte of the requested data in the response message. It includes both RTT_{get/post} (the round-trip time of the GET/POST message) and the processing time at the server end.

Transfer\ start = RTT_{get/post} + Time_{processing}
  • Download is the time taken by a client to fetch the complete data.

Note: The DNS lookup, TCP handshake, and SSL/TLS handshake times will generally be called base time for future references in the course.
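Putting the segments together, the breakdown above can be sketched as a back-of-the-envelope calculation. All numbers below are illustrative, not measurements:

```python
def latency_ms(base_ms: float, rtt_ms: float, download_ms: float) -> float:
    """Time_latency = Time_base + RTT_get/post + Time_download."""
    return base_ms + rtt_ms + download_ms


# Illustrative segment values for one HTTPS GET request:
base = 5.0 + 30.0 + 60.0  # DNS lookup + TCP handshake + TLS handshake
lat = latency_ms(base_ms=base, rtt_ms=30.0, download_ms=45.0)

processing_ms = 120.0  # time the server spends handling the request
total_response_ms = lat + processing_ms

print(lat, total_response_ms)  # 170.0 290.0
```

Note that in this breakdown the base time often rivals or exceeds the data transfer itself, which is why techniques such as connection reuse (keeping the TCP/TLS session open across requests) pay off.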

We can obtain the latency of a request by excluding the server’s processing time using the following general equation:

Time_{latency} = Time_{base} + RTT_{get/post} + Time_{Download}

RTT_{get/post} includes the message propagation time to and from the server.

Quiz

Question

Let’s suppose the client is accessing a service from different regions. Will the response time be the same in different regions for a particular server?


The response time can vary greatly depending on the region in which the requests are made. A user who initiates a request near the data center may receive a response quicker than someone far away. Therefore, to keep the service response time within an acceptable range, replication of servers across the world is essential.
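Distance alone sets a floor on this regional difference: signals in optical fiber travel at roughly two-thirds the speed of light, about 200 km per millisecond. The sketch below uses assumed distances to show the best-case round-trip times, ignoring routing, queuing, and processing:

```python
FIBER_SPEED_KM_PER_MS = 200.0  # ~2/3 the speed of light in vacuum


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber for a given one-way distance."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS


print(min_rtt_ms(100))    # nearby data center (~100 km): 1.0 ms
print(min_rtt_ms(8_000))  # cross-continent (~8,000 km): 80.0 ms
```

Even this physical lower bound differs by almost two orders of magnitude, before any real-world overhead is added, which is the core argument for replicating servers close to users.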

In the coming lessons, we’ll calculate the response time of an API by estimating the latency and processing time.
